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Abstract 

We describe an J^-statistic search for continuous gravitational waves from galactic white-dwarf 
binaries in simulated LISA Data. Our search method employs a hierarchical template-grid based 
exploration of the parameter space. In the first stage, candidate sources are identified in searches 
using different simulated laser signal combinations (known as TDI variables). Since each source 
generates a primary maximum near its true "Doppler parameters" (intrinsic frequency and sky 
position) as well as numerous secondary maxima of the J^-statistic in Doppler parameter space, 
a search for multiple sources needs to distinguish between true signals and secondary maxima 
associated with other, "louder" signals. Our method does this by applying a coincidence test to 
reject candidates which are not found at nearby parameter space positions in searches using each 
of the three TDI variables. For signals surviving the coincidence test, we perform a fully coherent 
search over a refined parameter grid to provide an accurate parameter estimation for the final 
candidates. Suitably tuned, the pipeline is able to extract 1989 true signals with only 5 false 
alarms. The use of the rigid adiabatic approximation allows recovery of signal parameters with 
errors comparable to statistical expectations, although there is still some systematic excess with 
respect to statistical errors expected from Gaussian noise. An experimental iterative pipeline with 
seven rounds of signal subtraction and re-analysis of the residuals allows us to increase the number 
of signals recovered to a total of 3419 with 29 false alarms. 
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I. INTRODUCTION 



The Mock LISA Data Challenges (MLDCs) [T] have the purpose of encouraging the devel- 
opment of LISA data-analysis tools and assessing the technical readiness of the community 
to perform gravitational- wave (GW) astronomy with LISA. The rounds completed so far 
have been labelled MLDCl [2], MLDC2 [3], MLDCIB and MLDC3 HE]. The chal- 
lenges have consisted of several data-sets containing different types of simulated sources and 
LISA noise, including quasi-periodic signals from white-dwarf binaries (WDBs). In this pa- 
per we describe an analysis performed on MLDC2 data, using an improved version of the 
pipeline that we originally applied in our MLDC2 entry [21 E] 

GW signals from WDBs will be long-lasting and (quasi-)monochromatic with slowly- 
varying intrinsic frequency /(t); in this sense they belong to the class of continuous GWs. 
In the case of ground-based detectors the typical sources of continuous GWs are spinning 
neutron stars with non-axisymmetric deformations. One of the standard tools developed 
for these searches is the J-'-statistic. We have applied this method in our MLDC searches, 
adapting the LAL/LALApps [7| search code ComputeFStatistic_v2 used within the LIGO 
Scientific Collaboration to search for periodic GW signals in data from ground-based detec- 
tors such as LIGO and GEO 600, e.g. see |8]. 

MLDCl and MLDCIB contained data sets with a relatively small number of simulated 
WDB signals, and the results of our searches on those data are reported elsewhere [U ITU] . 
The MLDC2 data-set contains a full simulated galaxy of WDB signals, with the challenge 
being to extract as many of these signals as possible. One approach, used by Crowder et 
al [HI [12], is to fit the overall signal with a multi-source template. Our analysis instead 
applies the traditional method of searching for individual sources. An important challenge 
in that regard is to distinguish secondary maxima in parameter space from primary peaks of 
true signals. We accomplish this through a hierarchical pipeline, which follows up candidates 
found in coincidence between searches carried out with different LISA observables. 

The plan for the rest of this paper is as follows: In section |TT] we review the fundamentals 
of the J-'-statistic search as applied to WDB signals in mock LISA data. In section III we 



describe our pipeline including the coincidence condition used to distinguish true signals from 
secondary maxima, and the estimation of expected statistical errors in the signal parameters. 



In section IV we describe some of the techniques used to evaluate the effectiveness of our 
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pipeline: the post-hoc classification of candidates into found signals and false alarms, and the 
discrepancies between the candidate parameters returned by our pipeline and the simulated 
values. In section |V] we describe the results of our pipeline in its optimal configuration 
and compare those with the results obtained using less sophisticated models of the LISA 



response. In section |VI| we present the results of an iterative program in which the signals 
found by the pipeline are subtracted from the data stream and then the pipeline is re-run 
on the residuals. 



II. SEARCH METHOD FOR CONTINUOUS SIGNALS FROM WHITE-DWARF 
BINARIES 



A. The J^-statistic 



The J-'-statistic was originally developed in [13], extended to the multi-detector case 
in [13], and generalized to the full Time-Domain Interferometry (TDI) [15j framework for 
LISA in [16]. The formalism for our application of this method to mock LISA data has 
been described in [D] and [TU], to which the reader is referred for details. Here we review the 
fundamentals of the method relevant to the current application. 

The signal received from a monochromatic GW source like a white-dwarf binary with 
negligible orbital evolution can be characterized by seven parameters. The three Doppler 
parameters are the intrinsic frequency / and two coordinates describing the sky location, 
such as galactic latitude (3 and longitude A, and can be denoted as 6^ = {/, (3, A}. The four 
amplitude parameters are the overall GW amplitude h^, the inclination angle l of the orbital 
plane, the polarization angle ip, and the initial phase 0o- One set of convenient combinations 
A^" = ^''(/io,i,^,0o) is 

A^ = cos 00 cos 2-0 — Ax sin 0o sin 2ip , (2.1a) 
A^ = A+ cos 00 sin 2ip + Ay, sin 0o cos 2ip , (2. lb) 

A^ = —A+ sin 00 cos 2tp — A^ cos 0o sin 2tp , (2.1c) 
A'^ = —A+ sin 00 sin 2-0 + Ay, cos0o cos 2-0 , (2- Id) 

where A^ = /io(l + cos^ l)/2 and A^ = Hq cost. Using these combinations, it is possible to 
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write the signal received in a detector I with instrumental noise n^{t) as 

x\t)=n\t)+A^h%e), (2.2) 

where we introduce the convention of an implicit sum X]|j=i ^^^^ repeated indices /x, z/, and 
the form of the template waveforms h^^{t\9) depends on the Doppler parameters and the 
specifics of the detector, such as orientation and motion as a function of time. 

Following the notation of p3l [T7] . we write the different data-streams x\t) as a vector 
x{t), and we define the standard multi-detector (with uncorrelated noise) scalar product as 

/oo 
^i\f)[SUf)r'fM)df. (2.3) 

Here we have broken up the observation time into intervals labelled by the Fourier- 

transform of the data in the ath time interval, x* denotes complex conjugation, and {Sai{f)} 
is the two-sided noise power spectral density appropriate to the ath time interval. We search 
for a signal {^s,^s} by seeking the parameters {Aci9c\ which maximize the log-likelihood 
ratio 

L{x- Ae) = {x\h)-]^{h\h)= A^'{x\h^) - ]^A^{h^\K)A'' . (2.4) 

Defining 

x^{e) = {x\h^), and M^M = {h^\K), (2.5) 

we see that L is maximized for given 9 by the amplitude estimator A'^ = Ai^^Xy, where M.^" 
is the inverse matrix of M.fj,y. Thus the detection statistic L, maximized over the amplitude 
parameters A is 

2J^{x-e) = x^,M^''x^,, (2.6) 

which defines the (multi-detector) J-'-statistic. One can show that the expectation in the 
perfect-match case 6' = 6's is E\2J^{6s)\ = 4 + |.4s|^, where we used the definition 

\A\^ = A^M^,{e,)A\ (2.7) 

for the norm of a 4-vector A^, using Aifj,^ as a metric on the amplitude-parameter space. 
Note that |^s| is the (optimal) signal-to-noise ratio (SNR) of the true signal {As^Os}. 
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FIG. 1: LISA configuration and TDI conventions used. 
B. Modelling the LISA response 

The MLDC data were generated by two different programs: Synthetic LISA [TH] simulates 
a detector output consisting of Doppler shifts of the LISA lasers due to relative motion of 
the spacecraft, while LISA Simulator [19] simulates the phase differences between laser light 
following different paths between the spacecraft.^ In both cases the underlying variables are 
combined with appropriate time shifts to form TDI observables which cancel the (otherwise 
dominating) laser frequency noise [151 [HI ISO]- One choice of such TDI quantities is the set 
of three observables {X,Y, Z}. These observables, which can be thought of as representing 

the output of three virtual "detectors" (which we label with the index /), are related to 

•f-> 

the gravitational wave tensor h through the detector "response", which can be modelled 
at different levels of accuracy. Our current approach uses the rigid adiabatic approxima- 
tion plj, but we also consider the long-wavelength limit (LWL). In the LWL approximation 
the reduced wavelength c/ (27r/) is assumed to be large compared to the distance L between 
the spacecraft, which corresponds to a light-travel time of T = L/c~ 17 s (assuming equal 
arm-lengths), and so this approximation requires / ^ lOmHz. These alternatives and their 
consequences are considered in more detail in [TO], but here we summarize the relevant 
approximations as they apply to our search. 

It is convenient to describe the "response" of a gravitational wave detector in the fre- 
quency domain in terms of a response function R{f), relating the detector output to a 

^ Our pipeline was constructed to handle either LISA simulator or synthetic LISA data, but for concreteness 
the results in this paper were all generated from the synthetic LISA data. 
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"strain" more closely connected to the metric perturbation tensor h, so that 
where : denotes the contraction of both tensor indices. In the long- wavelength limit, 

^synthLISA(^) ^ = ^ , (2.9a) 



and ~ '^lavl = (^2 ^ ^2 — ^ ^^3)/2 is the usual LWL response tensor for a GW 
interferometer with arms ^2 and ^3. The analogous expressions for Y and Z are obtained 
by cyclic permutations of the indices 1— 7'2— >3— t-I. In the remainder of this section we 
will give explicit expressions associated with the X variable, with the understanding that 
the formulas related to Y and Z can be constructed by analogy. 

A more accurate approximation to the TDI response is the so-called rigid adiabatic (RA) 
approximation [21J, which is valid in the regime where the finite lengths of data used to 
approximate the idealized Fourier transforms are short enough that the geometry and ori- 
entation of the detector doesn't change significantly during this time. In the RA formalism, 
the response is 

smc [271 J I j 

and, for a wave propagating along the unit vector k, 

3-(/J) = - t) . (2,11) 

where (defining ^(k) = (1 — k -n)) 

i2nfTk-n/3 

%n{f,k) = ^ {e*-mW sinc[7r/T^(-A?)] + e~^-i^^^-^') sinc[7r/Te(A?)]} (2.12) 

is a transfer function associated with the arm along n. Note that this is related to the 
7ji(/, k) defined in [21j by an overall phase, and also that T^(/, k) reduces to unity in the 
LWL / < l/(vrT). 

The input to the LAL/LALApps search code consists of Fourier-transformed data 
stretches of duration Tsft, referred to as Short Fourier Transforms (SFTs). This is a 
common data format used within the LIGO Scientific Collaboration for continuous-wave 
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full name 


label 


response 


detector tensor 


long- wavelength 
partial rigid adiabatic 
full rigid adiabatic 


LW 
pRA 
RA 


i?LWL(/) 

R{f) 
R{f) 


"lwl 

"LWL 



TABLE I: Definitions of the long- wavelength (LW), partial rigid adiabatic (pRA), and full rigid 
adiabatic (RA) formalisms, in terms of the response function R{f) (used to calibrate SFTs) and 
the detector tensor d {f, k). 

searches (e.g., see [22]). The time basehne TgpT has to be chosen sufficiently short such that 
the noise-floor can be approximated as stationary and the rotation and acceleration of the 
LISA detector can be neglected, and we chose Tsft = 7 days. 

We produce "calibrated SFTs" by Fourier-transforming the raw TDI data and applying 
a frequency-domain response function to produce a Fourier transformed strain 



J^(/)^i?(/)X(/) 



(2.13) 



For our MLDCl analysis [9] and MLDC2 submission [6] we used the long- wavelength approx- 
imation -Rlwl(/) for cahbrating SFTs, but for subsequent analyses (including our MLDCIB 
search [10]) we have produced "rigid adiabatic" SFTs, which use the full form of R{f) defined 



in (2.10). 



Our pipeline includes modifications to implement the full form of d {f,k). However, a 
logistically simpler intermediate approximation was also used in the initial foUowup to our 
MLDC2 work. In this "partial rigid adiabatic" (pRA) formalism, the more precise form of 



R{f) from (2.10) is used to construct the SFTs, but the further analysis proceeds with the 
simpler form of c^l^l- table 



II B 



for a summary of the three different levels of response 



approximation considered in this analysis. 



III. SEARCH ON MOCK LISA DATA 



A. Search Pipeline 

As in MLDCl [T7], we used the standard LAL/LALApps software [7] developed for 
the search for continuous GWs with ground-based detectors, in particular the code 
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FIG. 2: Doppler space structure of J-"-statistic in each of the three TDI variables {J-x-, ^ ^z) 
for a single source (indicated by the black cross) at /signal ~ 2.9044 mHz. Shown are points with 
IT > 20 over the whole sky and within a Frequency window of /signal ± 2 x 10~^/signai- White 
circles indicate local maxima in 2T. While the absolute maximum at the true signal parameters 
coincides in the three "single-detector" searches, many secondary maxima are found at different 
points in each of the three searches. This is the basis of the coincidence criterion used to distinguish 
between true signals and secondary maxima. 
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ComputeFStatistic_v2, which implements the multi-detector J-'-statistic (2.6). We ex- 
tended our LISA-specific generalizations of the code to allow analysis in either the long- 
wavelength or rigid adiabatic formalisms. 

Both single- and multi-detector J-'-statistic searches are complicated by the presence of 
secondary maxima in Doppler parameter space, i.e., points where J-" reaches a substantial 
local maximum, separated from the primary maximum at Doppler parameters of the true 
signal. This is illustrated in figure |2] for a search with only one injected signal. If only one 
signal is present, the global maximum of 2 J-" can be identified as the parameters of the true 
signal. Our original MLDCl search identified the loudest signal within a narrow frequency 
band as the true signal, but could not distinguish between secondary maxima due to that 
signal and weaker true signals nearby in frequency and at different points in the sky. 

In constructing our MLDC2 pipeline [6], we observed empirically that the same source 
tended to generate different patterns in secondary maxima across the sky in the TDI vari- 
ables X, Y, and Z. We thus identified "true" signal candidates by requiring them to have 
consistent Doppler parameters in single-detector searches performed using the X, Y, and Z 
observables. (The noise correlation among those three observables is irrelevant because this 
stage involves coincidences among the results of three single-detector searches rather than 
a coherent multi-detector search.) Note that a coherent multi-detector search involving X, 
Y and Z does not have this discriminating power, as it also yields a likelihood surface with 
primary and secondary maxima, similar to figure [2j The details of the coincidence criterion 
are discussed below, but it is based on requiring a low Doppler mismatch [T7] 

between candidates in different single-detector searches. No condition was placed on the 
consistency of the recovered amplitude parameters, in part because the LW searches in 
particular are known to produce unreliable Doppler parameters. Once signals had been 
identified in coincidence between pairs of single-detector searches, those candidates were fol- 
lowed up with finer-gridded single-detector searches, and candidates surviving in coincidence 
were then targeted with a coherent multi-detector search using the noise-independent TDI 
combinations X and Y — Z a.s the two "detectors" . 
The detailed pipeline was thus as follows: 

Stage One: Wide parameter-space single-detector searches using each of the TDI variables X, Y, 
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and Z. Up to = 100,000 Doppler parameter points with > 2Tth = 20 values 
are kept. Note that only one year of data is used in these initial single-detector stages. 
This was empirically found to cut down on false alarms. 

Stage Two: Identification of local maxima in each "detector" . A signal is a local maximum if there 



is no signal with higher 2^-" value at a Doppler mismatch (3.1 ) of m < ttilm = 2. 



Stage Three: Identification of coarse coincidences among the searches, using the coincidence con- 
dition m < mcoiNci = 0.8. For our search of MLDC2 data, a set of candidates in 
the three detectors was considered to be in coincidence if each pair of candidates was 
within the prescribed mismatch. 

Stage Four: Followup of initial candidates with finer-gridded single-detector searches and tighter 
coincidence. The search "zooms in" by iteratively increasing the resolution of the 
Doppler parameter grid for each detector. At the end of this process, coincidence 
among the detectors is checked again with the tighter condition m < mcoiNC2 = 0.2. 

Stage Five: Final multi-detector followup of surviving candidates. Now a multi-detector search is 
performed with the TDI combinations X and Y — Z, which have independent noise 
contributions. As in stage four, the search "zooms in" on the true Doppler parameters 
iteratively. The ultimate resolution of the search is set by this multi-detector search, 
which uses the full two years of data. 



B. Parameter Errorbars 



We estimated the errors expected from Gaussian fluctuations of the noise using the Fisher 
information matrices on the amplitude and Doppler parameter subspaces. For the amplitude 
parameters, the expected discrepancy = — -^s between the parameters Ac returned 
by the search and the true signal parameters is described by the expectation value 

E [AA^AA"] = M^'iO,) , (3.2) 

so we can quote an errorbar on a particular A'^ of 

a^^. = ^/M^^, (3.3) 
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with no implied sum over /i (we made no attempt to translate this back into errorbars on 
the physical parameters h^, cost, ip and 0o). 

Note, however, that this definition assumes that either the Doppler parameters are 
perfectly matched by the candidate (i.e., 6c = 6s) or there are no correlations between 
amplitude- and Doppler-coordinates in the full parameter-space Fisher matrix. In prac- 
tice none of these two conditions are satisfied in the present search. Therefore we expect 
deviations from these predicted error-distributions even in the case of perfectly Gaussian 
noise. 

For the Doppler parameters, the expected discrepancy A6^ = 61 — 61, between the Doppler 
parameters 61 of the primary 2J^ maximum and those 61 of the simulated signal is described 
by 

E [A^^A^^] = , (3.4) 
where F*-^ is the inverse of the Fisher information matrix 

fij = 9ij |-4.c| , (3-5) 
which can be defined in terms of the Doppler metric gij associated with the mismatch 



(3.1). Similar to the error-estimates on amplitudes, this definition assumes either perfectly 
matched amplitude parameters (i.e., Ac = As) or a block-diagonal Fisher matrix over the full 
parameter space, with no correlations between amplitude- and Doppler-space. In practice 
none of these conditions is true and we therefore expect deviations from the predicted error 
estimates. 

Rather than the full J-'-statistic metric, we use the approximate orbital metric [T7]. This 
metric is approximated having constant elements in terms of the coordinates {uq, k^,ky} 
where 

Uq = 2Txf , (3.6a) 
/Ca, = — 27r/-^ cos/3 cos A , (3.6b) 

fcy = — 27r/-^ cos /3 sin A . (3.6c) 

A limitation of the orbital metric is that it cannot distinguish between the points {/, /3, A} 
and {/, — /3, A} which are refiected through the ecliptic. (The search itself can distinguish 
between points with different signs of ecliptic latitude, thanks to the different amplitude 
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modulation, but this is not captured in the orbital metric.) Assigning error bars to the 
frequency / and ecliptic longitude A is straightforward, but converting an uncertainty in 
cos /3 into an uncertainty in ecliptic latitude ^ is complicated by the orbital metric becoming 
singular at the ecliptic. As a workaround, wc first calculate error bars 

2 (3.7a) 



cr 



2 d\ dX . 



ggi ggj (3-7c) 

and then estimate the error in (3 as 

(7/3 := cos~^(cos^ - (Jcos/j) - |/?| , (3.7d) 

so that 

cos(|/3|+(T;3) =cos^ + (7cos/3, (3.7e) 

i.e., we match the one-sigma equation in the direction away from the ecliptic, where the con- 
version between (3 and cos f3 should be well behaved. It is of course possible that cr^ > in 
which case the ±1(7 interval we define straddles the ecliptic, but this agrees qualitatively with 
the observation that some signals near the ecliptic are recovered in the opposite hemisphere. 



IV. EVALUATION 



A. Signal Identification 

When run on the MLDC 2.1 dataset, our pipeline returns ~ 2000 signals found in co- 
incidence. To evaluate its performance, wc check how many of those sources were found 
at parameters consistent with those of one of the galactic binary signals injected into the 
data. The original datasets were generated with ~ 30 million signals, but of those 59401 
were considered "bright" enough to detect by the MLDC Task Force and their parameters 
were placed into a separate key file. It is against that key that we compare our results. 

In part due to the known inaccuracies in amplitude parameters associated with the long- 
wavelength and partial rigid adiabatic responses, we checked for consistency using only the 
Doppler parameters (frequency and sky position). A signal was considered to be "found" if 
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the Doppler parameters of the candidate and the key had a mismatch 



m,, = g,^M'M^ (4.1) 

of 1 or less. (In the case of multiple injected and/or candidate signals satisfying the mismatch 
condition, the brightest were "paired off" first.) If no injected signal matched a candidate, 
that candidate was considered to be a "false alarm" . 



B. Parameter Errors 



These expectation values (3.2) and (3.4) allow us to define, as in pUj . 



(4.2) 



and 



so that 



— ^AA^AA" = 



E [eVi =l = E [e\] 



(4.3) 



(4.4) 



in the ideal case of statistical errors due to Gaussian noise, and no correlations between 
amplitude- and Doppler-parameters. 



V. RESULTS WITHOUT SIGNAL SUBTRACTION 



A. Signal Recovery 

The signal recovery of our pipeline using the various response models is summarized in 
table |V A[ Note that signal recovery is not significantly affected by the scalar response func- 



tion R{f)i so the pRA search performs almost identically to the LW one. (See section II B 



and table II B for the definitions of the different models of the LISA response.) However, 
the use of the full response tensor d{f, k) in the full RA search leads to an increase in the 
number of found signals from 1704 to 1989, with much of the improvement coming from 
higher-frequency signals, and a reduction in the number of false alarms. Note that while the 
designated "bright signals" key contained 59401 sources, many of those were still not bright 
by the standards of our search, having low values of |^s|^- Since the faintest signal found by 
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Freqs 


Signals 
{\Af> 40) 


] 

RA 


^ounc 
pRA 


I 

LW 


RA 


False 
pRA 


LW 


0-5 mHz 


4446 


1025 


984 


982 


2 


1 


1 


5-10 mHz 


1967 


822 


652 


652 


3 


5 


5 


10-15 mHz 


163 


133 


68 


68 





1 


1 


15-20 mHz 


7 


7 


2 


2 











20-27 mHz 


3 


2 











2 


2 


Total 


6586 


1989 


1706 


1704 


5 


9 


9 



TABLE II: Comparison of galactic WDB signal recovery with long- wavelength (LW), partial rigid 
adiabatic (pRA) and full rigid adiabatic (RA) response. The pRA and LW response functions differ 
only in the scalar piece R{f), and produce similarly efficient searches. The RA response, including 
the full response tensor d{f,k), leads to more found signals and fewer false alarms, especially at 
higher frequencies. 

our search had 2J^ = \Ac\ = 69.4, we focus attention on sources with lAr > 40, of which 
there are 6586 in the key file. 



B. Doppler parameter accuracy 



As described in section IV B[ we can compare the Doppler parameters 6c of each candidate 



with those 6s of the corresponding signal in the key. The errors, as a function of candidate 
frequency, are shown in figure [s] We show the errors for the searches using the full (RA) 
and partial (pRA) rigid adiabatic responses. The Doppler parameters returned using the 
long-wavelength (LW) limit are almost identical to the pRA results, so we omit those. We 



plot the combined measure eg defined in (4.2) (which has E [e^] = 1 in the case of Gaussian 



statistical errors), the error A/ in the measured frequency and the angle 0sky between the 
recovered and actual sky positions. The pRA Doppler errors are seen to be a bit larger 
than the RA Doppler errors. We can quantify this by making a cumulative histogram of the 



Doppler error measure eg defined in (4.2). For statistical errors arising from Gaussian noise 



and neglecting amplitude-Doppler correlations, Se^ should follow a central distribution 
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FIG. 3: Doppler parameter errors as a function of frequency. The left panel shows the error 



measure eg defined in (4.2) which is normalized to have expected unit RMS value in the case of 
perfectly matched amplitude parameters and statistical errors caused by Gaussian noise. The right 
panels show the errors in frequency and sky position. We show the errors for searches using the 
partial rigid adiabatic (pRA) and full rigid adiabatic (RA) responses. The Doppler errors using 
the naive long-wavelength limit are nearly identical to the pRA results and are therefore omitted. 

with three degrees of freedom, i.e., 




(5.1) 

The cumulative histograms of eg for the RA and pRA search, together with these theoretical 
expectations, are plotted in figure |4j 
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Cumulative histogram of doppler parameter errors 
3000 r ' ' ' ' 1 




0.3 1 3 10 



60 threshold 



FIG. 4: Cumulative histograms for Doppler parameter errors eg (see figure [3] for definitions). The 



dotted lines show the appropriately scaled theoretical CDF (5.1) in the case of perfectly matched 
amplitude parameters and statistical errors caused by Gaussian noise, where 3eg follows a central 
distribution with three degrees of freedom. 

C. Amplitude parameter accuracy 



As described in section IV B , we compare the vector of Amplitude parameters Ac of each 
candidate with those of the corresponding signal in the key. The natural geometrical 
structure for studying those vectors is the metric A^^^. (For concreteness we use J^^uiOs), 
i.e. evaluated at the Doppler parameters of the signal in the key). We can use the quantity 



defined in (4.3) (which has E [e^^] = 1 in the case of Gaussian statistical errors and perfectly 
matched Doppler parameters) as a measure of the overall discrepancy. We can also separate 
out the discrepancy in the length of the amplitude vectors from the angle between them in 
the four-dimensional amplitude parameter space, using quantities defined in [TO] : 

\A |2 _ I /I |2 _ 4 
^ 4 



is a measure of the amplitude discrepancy designed to have E [6j] = 0, while 

I Ac I I A; 
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cos-i ( '^!'!^r/^; ) (5.3) 



LW Amplitude Parameter EiTors 
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FIG. 5: Amplitude parameter errors using the long-wavelength limit (LWL) approximation, as 



a function of frequency. The left panel shows the error measure defined in (4.3) which is 
normalized to have expected unit RMS value in the case of statistical errors caused by Gaussian 
noise. The right panels show the errors in the magnitude and direction of the amplitude parameter 



vector, as measured by the amplitude difference 6_a defined in (5.2) and the angle between the 



true and recovered amplitude parameter vectors, defined in (5.3) Note the frequency-dependent 



systematic phase error and loss in SNR (5^ < 0), discussed in section VC 



measures the angle between the amplitude parameter vectors, and can be thought of as a 
phase discrepancy. Both and 0_4 have expected standard deviation l^sT^ in the case of 
Gaussian statistical errors. 

In figure [5} figure [6| and figure [7] we plot e^, 5^ and </)^ for the signals recovered in the 
LWL, pRA, and RA searches, respectively. The LWL results have substantial systematic 
errors in both 5_4 and 0^. The error in 0^, which increases linearly with frequency, turns 
out to be mostly due to a systematic error in the initial phase 0o corresponding to a time 



shift of 2T. This problem can be traced to the absence of the factor of e^^'^^'^ from (2.10) 
in the scalar response function -Rlwl(/) used in the LW search, and is fixed in the partial 
rigid adiabatic (pRA) approximation. There is also a systematic trend towards negative 5^, 
which, recalling that 2 J-" = \Ac\^ , corresponds to a signal being recovered with lower SNR 
than expected from the true amplitude parameters. Part of this effect is removed by the 



inclusion of sine (27r/T) in the denominator of (2.10), but the pRA results still show show 
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FIG. 6: Amplitude parameter errors using the partial rigid adiabatic (pRA) approximation, as 
a function of frequency. The quantities shown are as in figure [5j Note the frequency-dependent 



systematic loss in SNR (5^ < 0), discussed in section VC 



a frequency-dependent SNR deficiency. Both effects are absent in the full RA search. 

We can quantify the systematic errors, especially those still present in the RA search, 
by making a cumulative histogram of e_4. For statistical errors arising from Gaussian noise 
(and neglecting amplitude-Doppler correlations), 4eg should follow a central distribution 
with four degrees of freedom, i.e., 

P(e^ > e\) = e-'^- (l + 2e*/) . (5.4) 

The cumulative histograms of for the RA, pRA, and LWL searches, together with these 
theoretical expectations, are plotted in figure [8j 

D. Errors relative to estimated errorbars 

Another quantitative comparison of the size of the parameter errors to theoretical expec- 
tations is the ratio of the actual parameter errors A^'^ or A6^ to the errorbars cr_4M or a^i 
defined in section IIIB[ We plot these for the full rigid adiabatic search. In figure [9] we his- 



togram the relative frequency errors Af/af for the 1989 recovered signals. For comparison, 
we also plot the cumulative histogram for a standard normal distribution, and for one with 
a mean 0.01 and standard deviation of 1.68, the values measured from the actual Af/af 
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FIG. 7: Amplitude parameter errors using the full rigid adiabatic (RA) approximation, as a 
function of frequency. The quantities shown are as in figure [5] Note that the frequency-dependent 
systematic phase error seen in figure [5] and loss in SNR seen in figure [5] and figure [5] are absent, as 
discussed in section IV C[ 



distribution. In figure [TO] we make similar histograms of the errors in the galactic latitude 
P and longitude A. We see that while the errors in sky location are comparable in scale 
to the computed errorbars, the frequency errors are slightly larger than expected, with a 
standard deviation of 1.68 (rather than unity) on Af /aj. (This error is of comparable size 
in the RA, pRA, and LW searches.) For the amplitude parameters, we collect together 
{A^^/cr^M|yU = 1,2,3,4} for each of the 1989 recovered signals, and histogram those 7956 
values. We see a slight systematic excess in these errors, with a standard deviation of 1.56 
on the distribution of AA^/cj^t^ values. This is smaller than the corresponding value of 2.21 
from the pRA search and much smaller than the 15.22 found in the LW search. Replacing 
the long-wavelength approximation with the full rigid adiabatic response gives us amplitude 
parameters we can trust. 

VI. RESULTS WITH SIGNAL SUBTRACTION 

A known limitation of our search pipeline is that it identifies individual signals, treating 
all of the other signals as background noise. One proposed approach is to generate signals 
corresponding to the candidates returned by a search of the data, subtract those from the 
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FIG. 8: Cumulative histograms for Amplitude parameter errors (see figure [T] for definitions). 



The dotted lines show the appropriately scaled theoretical CDF (5.4) in the case of statistical 
errors caused by Gaussian noise (and neglecting amplitude-Doppler correlations), where 46^^ follows 
a central distribution with four degrees of freedom. The systematic errors illustrated by the 
non-Gaussian tails are greatly reduced by using the rigid adiabatic (RA) response. 

data, and re-run the search on the resulting residuals. (This approach is a pedestrian 
alternative to the multiple-signal templates used in pTl 112].) This approach is only likely 
to work in a search which returns reliable signal parameters, and so we did not attempt 
it with our original long-wavelength-approximation pipeline. However, since the pipeline 
using the rigid adiabatic response generates very few false alarms, and reasonable Doppler 
and amplitude parameter accuracy, we can use it for a simple, illustrative signal subtraction 
program. The algorithm we use is as follows: 



1. Run the original dataset through the standard pipeline described in section [ITT] to 
obtain a set of candidate signals. 

2. Invert the amplitude parameter vectors [A^ — )■ —A'^) of the candidate signals, and 
use those parameters to generate a composite signal to cancel out the signals found 
so far. For this step we used the lisatools [23] routines used to generate the original 
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FIG. 9: Cumulative histogram of frequency errors as a fraction of the errorbars estimated according 



to (3.7a), for the full RA search. We plot the histogram of the actual values, along with those for 
two Gaussian distributions: one with unit variance and zero mean (standard normal) and one with 
a mean of 0.01 and standard deviation of 1.68, equal to those in the actual distribution of A//crj 
values. 

challenge data, in particular the FastGalaxy code [21] and synthetic LISA [15]^. 

3. Add this cancellation data set to the original MLDC data and generate a new set of 
SFTs with the found signals subtracted out. 

4. Run the signal-subtracted dataset through the standard pipeline to obtain a set of 
"new" candidate signals. 

5. Compare the new signals with the signals already found to distinguish duplicates 
(corresponding to unmatched residual portions of signals) from truly new signals. 
This is done using the same matching criterion used to compare found signals with 



2 



This convenient use of existing infrastructure had the potential drawback that differences between these 
signal-generation algorithms and the signal models used in our search could lead to imperfections in the 
cancellation of signals. 



22 




FIG. 10: Cumulative histograms for errors in the recovered galactic latitude (3 and longitude A as 



a fraction of the errorbars estimated according to (3.7d) and (3.7b), for the full RA search, along 
with Gaussian distributions for comparison as in figure |9j 



23 




FIG. 11: Cumulative histogram of the errors in all amplitude parameters a fraction of the 



errorbars estimated according to (3.3), for the full RA search, along with Gaussian distributions 



for comparison as in figure 10 



a key, described in section |IV A[ In this step, duplicates are analogous to "matched" 



signals and truly new signals are analogous to "false alarms" . 

6. Combine the old and new signal lists to obtain a master list of signals found so far. 
Duplicates are only listed once, using the Doppler parameters with which they were 
found in the original search, and combined amplitude parameter vectors A'^^^ + ^^ew 

7. Repeat the process from step 2, using the new master list of signals found so far to 
subtract from the original dataset. 



The results of the iterative procedure are shown in table VI We performed seven rounds 
of iterative subtraction and re-analysis (stopping the process at that point because the last 
round only added two new signals). This procedure increased the number of "true" signals 
found from 1989 to 3419, but also increased the false alarm rate, with the number of false 
alarms going from 5 to 29. This increase in the false alarm rate is not surprising, since we 
are looking for the weaker signals that remain after the brightest ones have been removed. 
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Rounds of Subtraction 





1 


2 


3 


4 


5 


6 


7 


Found Signals 


1989 


2962 


3250 


3346 


3392 


3405 


3417 


3419 


False Alarms 


5 


23 


24 


27 


28 


28 


29 


29 



TABLE III: Results of iterative signal subtraction, described in section VI 



Search 


MTJPL 


PWAEI 


IMPAN 


UTB 


LW 


RA 


RAsubt 


Found Signals 


18084 


1766 


264 


281 


1704 


1989 


3419 


False Alarms 


1240 


11 


140 


3581 


9 


5 


29 



TABLE IV: Comparison of different entries in the Second Mock LISA Data Challenge, and results 
of the current pipeline. The MLDC2 entries, which are described in [3j and available from [25], 
are: "MTJPL" , a submission by Crowder et al [11^ [T2] using the Metropolis-Hastings Monte Carlo 
with a multi-signal template; "PWAEI", a submission by two of the present authors (Prix and 
Whelan) using a version of the present pipeline with the long- wavelength response [6] ; "IMPAN" , a 
submission by Krolak and Blaut using an J^-statistic method [16J which was refined for subsequent 
MLDC rounds |26| I27j : and "UTB", a submission by Nayak et al using a tomographic method 
to map the overall distribution of galactic binaries [28]. Included for comparison are "LW", the 
present search with the long-wavelength response; "RA" , the present search with the rigid adiabatic 



response; and "RAsubt" , the search described in section VI with seven rounds of signal subtraction. 



VII. COMPARISON TO ENTRIES IN THE SECOND MOCK LISA DATA CHAL- 
LENGE 



An earlier version of this pipeline, using the long-wavelength approximation, was used 
to generate our entry in the second Mock LISA Data Challenge [6]. Several other 
MLDC2 entries analyzed the same data set, as described in [3]. The parameters re- 
turned by those "blind" searches are recorded at [25j, and can be compared to the 



present pipeline using the criteria described in section IV A script to do this is in the 
MLDCevaluation/Galaxy_Evaluation/AEI directory of lisatools [23], and we have run this 
on the entries, along with suitably converted versions of the searches reported in section IVj 



and section VI For brevity we only report the numbers of false alarms and false dismissals, 
which are summarized in table IVIIi 
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Our original MLDC entry returned the second highest number of true signals. While the 
UTB entry included more candidate signals than ours, very few of them were associated with 



actual sources according to the method of section IV A Incidentally, the same qualitative 



information carried in the right-hand panel Figure 1 of [3j is reflected in table VII, which 
considers the match between Doppler parameters of candidate signals and sources in the 
key: the PWAEI candidates have the lowest false alarm percentage, most of the 18084 
MTJPL signals are real, and the UTB signals, in addition to having no amplitude parameter 
information and not determining the sign of the ecliptic latitude, have no discernible one- 
to-one correlation with the true sources. 

The results of our original MLDC entry ("PWAEI") are similar to the LW results re- 
ported in section |V} but not identical because the original search used the LISA simulator 
data, and the present search uses data from synthetic LISA. The addition of the rigid adi- 
abatic response makes a drastic improvement in amplitude parameter recovery as detailed 



in section VC, but also increases the number of recovered signals, as does the experimen- 



tal iterative signal subtraction technique detailed in section VI The various incarnations 
of our pipeline recovered between 1704 and 3419 of the 59401 "bright" signals present in 
the data set (which contained a further 30 million background signals assumed to be unde- 
tectable); note, however, that due to our current pipeline first-stage threshold of IT > 20 



(see Sec. Ill A) we only expect to be able to find at most 6586 or so signals with |^s|^ > 40. 
The MTJPL search by Crowder et al was able to recover 18084 true signals. This is still less 
than 59401, so the a priori assessment of "bright" signals should be taken with a grain of salt. 
However, the MTJPL pipeline also only missed 49 of those 6586 signals with |v4s|^ > 40, 
which indicates a higher intrinsic efficiency. 

One advantage of our pipeline is speed. The full search can run in a matter of hours using 
a few hundred nodes of a computing cluster. The iterative signal subtraction was run over 
the course of a week, but much of that was taken up in subtracting signals and re-starting 
the pipeline by hand. The speed will also be affected by the choice of first-stage threshold in 
our pipeline (currently 2 J-" > 20). We could increase the number of signals found by lowering 
this threshold, but the next-stage follow-up steps and signal-subtraction would then take 
more time. More work would be required to understand how much efficiency could be gained 
at what computing cost by lowering this threshold, and how it would affect the quality and 
reliablity of signal extraction. 
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VIII. CONCLUSIONS 



We have applied an J-'-statistic template bank search to mock LISA data containing 
a full galaxy of simulated white-dwarf binary systems. A multi-stage pipeline requiring 
Doppler parameter coincidence between searches using different TDI variables is effective in 
distinguishing true signals from false alarms and allows 1989 signals to be recovered with 
only 5 false alarms. 

The use of the rigid adiabatic model for LISA response, including a response tensor 
depending on signal frequency and sky direction, eliminates the systematic amplitude pa- 
rameter errors associated with searches using a long-wavelength approximation, and also 
allows more signals to be identified than with the simpler long-wavelength response tensor. 

The relatively accurate recovery of both amplitude and Doppler parameters allows an 
experimental implementation of an iterative signal subtraction pipeline; after seven rounds 
of signal subtraction and re-analysis, the number of found signals was increased to 3419, 
with a total of 29 false alarms. 
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